Measuring, Characterizing, and Detecting Facebook Like Farms
Social networks offer convenient ways to seamlessly reach out to large
audiences. In particular, Facebook pages are increasingly used by businesses,
brands, and organizations to connect with multitudes of users worldwide. As the
number of likes of a page has become a de-facto measure of its popularity and
profitability, an underground market of services artificially inflating page
likes, aka like farms, has emerged alongside Facebook's official targeted
advertising platform. Nonetheless, there is little work that systematically
analyzes Facebook pages' promotion methods. Aiming to fill this gap, we present
a honeypot-based comparative measurement study of page likes garnered via
Facebook advertising and from popular like farms. First, we analyze likes based
on demographic, temporal, and social characteristics, and find that some farms
appear to be operated by bots and make little effort to hide the nature of their
operations, while others follow a stealthier approach, mimicking regular users'
behavior. Next, we examine fraud detection algorithms currently deployed by
Facebook and show that they are ineffective at detecting stealthy farms that
spread likes over longer timespans and like popular pages to mimic regular
users. To overcome these limitations, we investigate the feasibility of
timeline-based detection of like farm accounts, focusing on characterizing
content generated by Facebook accounts on their timelines as an indicator of
genuine versus fake social activity. We analyze a range of features, grouped
into two main categories: lexical and non-lexical. We find that like farm
accounts tend to re-share content, use fewer words and a poorer vocabulary, and
generate duplicate comments and likes more often than normal users do.
Using relevant lexical and non-lexical features, we build a classifier to
detect like farm accounts that achieves over 99% precision and 93% recall.
Comment: To appear in ACM Transactions on Privacy and Security (TOPS).
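To make the timeline-based approach concrete, below is a minimal sketch of how such a classifier could be assembled. The feature set and the random-forest model are illustrative assumptions chosen for exposition; they simplify the lexical and non-lexical features described above and are not the paper's exact pipeline.

```python
# Sketch: classify accounts from simple timeline features (illustrative only).
from sklearn.ensemble import RandomForestClassifier

def timeline_features(posts, comments, is_reshare):
    """Lexical and non-lexical features for one account's timeline."""
    words = [w for p in posts for w in p.split()]
    n_posts = max(len(posts), 1)
    return [
        len(words) / n_posts,                            # avg words per post (lexical)
        len(set(words)) / max(len(words), 1),            # vocabulary richness (lexical)
        sum(is_reshare) / n_posts,                       # re-share ratio (non-lexical)
        1 - len(set(comments)) / max(len(comments), 1),  # duplicate-comment ratio
    ]

# Toy training data: farm accounts re-share short, duplicate content,
# while genuine users post longer, more varied text.
farm = timeline_features(["great page", "great page"], ["nice", "nice"], [1, 1])
user = timeline_features(["had a lovely hike in the hills today",
                          "trying a new pasta recipe tonight"],
                         ["looks fun!", "congrats on the move"], [0, 0])

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit([farm, user], [1, 0])  # 1 = like farm, 0 = genuine user
print(clf.predict([timeline_features(["great page"], ["nice"], [1])]))
```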
Characterizing Key Stakeholders in an Online Black-Hat Marketplace
Over the past few years, many black-hat marketplaces have emerged that
facilitate access to reputation manipulation services such as fake Facebook
likes, fraudulent search engine optimization (SEO), or bogus Amazon reviews. In
order to deploy effective technical and legal countermeasures, it is important
to understand how these black-hat marketplaces operate, shedding light on the
services they offer, who is selling, who is buying, what they are buying, who
is more successful, and why. Toward this goal, in this
paper, we present a detailed micro-economic analysis of a popular online
black-hat marketplace, namely, SEOClerks.com. As the site provides
non-anonymized transaction information, we set out to analyze the selling and buying
behavior of individual users, propose a strategy to identify key users, and
study their tactics as compared to other (non-key) users. We find that key
users: (1) are mostly located in Asian countries, (2) are focused more on
selling black-hat SEO services, (3) tend to list more lower-priced services,
and (4) sometimes buy services from other sellers and then sell at higher
prices. Finally, we discuss the implications of our analysis with respect to
devising effective economic and legal intervention strategies against
marketplace operators and key users.Comment: 12th IEEE/APWG Symposium on Electronic Crime Research (eCrime 2017
A semantics aware approach to automated reverse engineering unknown protocols
Extracting the protocol message format specifications of unknown applications from network traces is important for a variety of applications such as application protocol parsing, vulnerability discovery, and system integration. In this paper, we propose ProDecoder, a network-trace-based protocol message format inference system that exploits the semantics of protocol messages without requiring the executable code of application protocols. ProDecoder is based on the key insight that the n-grams of protocol traces exhibit a highly skewed frequency distribution that can be leveraged for accurate protocol message format inference. In ProDecoder, we discover the latent relationship among n-grams by first grouping protocol messages with the same semantics and then inferring message formats by keyword-based clustering and cluster sequence alignment. We implemented and evaluated ProDecoder to infer the message format specifications of SMB (a binary protocol) and SMTP (a textual protocol). Our experimental results show that ProDecoder accurately parses and infers the SMB protocol with 100% precision and recall. For SMTP, ProDecoder achieves approximately 95% precision and recall.
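As a rough illustration of the n-gram insight, the sketch below counts byte n-grams across a toy SMTP trace, treats the most frequent ones as candidate keywords, and groups messages by the keywords they contain. The frequency threshold and the grouping rule are simplifying assumptions; ProDecoder itself uses latent-topic inference over n-grams followed by cluster sequence alignment.

```python
# Sketch: keyword discovery from skewed n-gram frequencies (illustrative only).
from collections import Counter

def ngrams(msg: bytes, n: int = 3):
    """All byte n-grams of a protocol message."""
    return [msg[i:i + n] for i in range(len(msg) - n + 1)]

messages = [b"HELO mail.example.com\r\n",
            b"MAIL FROM:<a@example.com>\r\n",
            b"MAIL FROM:<b@example.org>\r\n",
            b"RCPT TO:<c@example.com>\r\n"]

# Count n-grams across the trace; the head of the skewed
# frequency distribution holds the candidate keywords.
counts = Counter(g for m in messages for g in ngrams(m))
keywords = {g for g, c in counts.most_common(10) if c > 1}

# Group messages by the keyword set they contain (a crude
# stand-in for topic-model-based semantic clustering).
clusters = {}
for m in messages:
    sig = frozenset(g for g in keywords if g in m)
    clusters.setdefault(sig, []).append(m)

for sig, msgs in clusters.items():
    print(sorted(sig), "->", msgs)
```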